TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling
نویسندگان
چکیده
We present a system for taxonomy construction that reached the first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. Our simple yet effective approach harvests hypernyms with substring inclusion and Hearst-style lexicosyntactic patterns from domain-specific texts obtained via language model based focused crawling. Extracted taxonomies are evaluated on English, Dutch, French and Italian for three domains each (Food, Environment and Science). Evaluations against a gold standard and by human judgment show that our method outperforms more complex and knowledge-rich approaches on most domains and languages. Furthermore, to adapt the method to a new domain or language, only a small amount of manual labour is needed.
منابع مشابه
LT3: A Multi-modular Approach to Automatic Taxonomy Construction
This paper describes our contribution to the SemEval-2015 task 17 on “Taxonomy Extraction Evaluation”. We propose a hypernym detection system combining three modules: a lexico-syntactic pattern matcher, a morphosyntactic analyzer and a module retrieving hypernym relations from structured lexical resources. Our system ranked first in the competition when considering the gold standard and manual ...
متن کاملA Metric-based Framework for Automatic Taxonomy Induction
This paper presents a novel metric-based framework for the task of automatic taxonomy induction. The framework incrementally clusters terms based on ontology metric, a score indicating semantic distance; and transforms the task into a multi-criteria optimization based on minimization of taxonomy structures and modeling of term abstractness. It combines the strengths of both lexico-syntactic pat...
متن کاملUSAAR at SemEval-2016 Task 13: Hyponym Endocentricity
This paper describes our submission to the SemEval-2016 Taxonomy Extraction Evaluation (TExEval-2) Task. We examine the endocentric nature of hyponyms and propose a simple rule-based method to identify hypernyms at high precision. For the food domain, we extract lists of terms from the Wikipedia lists of lists by using the name of each list as the endocentric head and treating all terms in the ...
متن کاملSemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)
This paper describes the second edition of the shared task on Taxonomy Extraction Evaluation organised as part of SemEval 2016. This task aims to extract hypernym-hyponym relations between a given list of domain-specific terms and then to construct a domain taxonomy based on them. TExEval-2 introduced a multilingual setting for this task, covering four different languages including English, Dut...
متن کاملQASSIT at SemEval-2016 Task 13: On the integration of Semantic Vectors in Pretopological Spaces for Lexical Taxonomy Acquisition
This paper presents our participation to the SemEval “Task 13: Taxonomy Extraction Evaluation (TExEval-2)” (Bordea et al., 2016). This year, we propose the combination of recent semantic vectors representation into a methodology for semisupervised and auto-supervised acquisition of lexical taxonomies from raw texts. In our proposal, first similarities between concepts are calculated using seman...
متن کامل